IMPORTING LIBRARIES

Importing the data sets

EDA (Exploratory Data Analysis)

SHAPE OF BOTH DATASETS

Observations:

Checking information on the train dataset

Observations:

Checking the percentage of missing data in each column
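
A minimal sketch of how the missing-data percentage per column could be computed with pandas. The small `train` frame below is made-up data standing in for the real train dataset, and the name `missing_pct` is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the train dataset; the values are made up.
train = pd.DataFrame({
    "LotArea": [8450, 9600, np.nan, 9550],
    "Alley": [np.nan, np.nan, np.nan, "Pave"],
    "OverallQual": [7, 6, 7, 7],
})

# Share of missing values per column as a percentage, largest first.
missing_pct = (train.isna().mean() * 100).sort_values(ascending=False)
print(missing_pct)
```

Sorting in descending order puts the columns with the most gaps at the top, which makes it easier to decide which ones to drop.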

Observations:

Making a copy of both datasets before restructuring them
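
One way to take such a copy is `DataFrame.copy()`, which makes a deep copy by default, so later edits do not touch the original. The frame contents and the name `train_copy` below are assumptions for illustration.

```python
import pandas as pd

# Toy frame standing in for the train dataset; the values are made up.
train = pd.DataFrame({"Id": [1, 2], "LotArea": [8450, 9600]})

# copy() is deep by default: the copy owns its own data.
train_copy = train.copy()

# Changing the copy leaves the original untouched.
train_copy.loc[0, "LotArea"] = 0
print(train.loc[0, "LotArea"], train_copy.loc[0, "LotArea"])
```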

After carefully reviewing the features that each dataset possesses, I decided to drop some columns that have no missing values but are unnecessary, because the information they carry is already described by another feature

Dropping the rows with NaN values
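
The two cleaning steps above can be sketched as follows. The notebook does not list which columns were dropped, so `redundant_cols` is a hypothetical choice, and the frame holds made-up values.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the copied train dataset; the values are made up.
train_copy = pd.DataFrame({
    "GarageCars": [2, 1, 2, np.nan],
    "GarageArea": [548, 460, 608, 642],
    "LotArea": [8450, np.nan, 11250, 9550],
})

# Hypothetical example of a column whose information another feature already carries.
redundant_cols = ["GarageArea"]
train_copy = train_copy.drop(columns=redundant_cols)

# Drop every row that still has at least one NaN, then renumber the index.
train_copy = train_copy.dropna().reset_index(drop=True)
print(train_copy.shape)
```

Note that `dropna()` on the test data would also remove rows that need a prediction, so it may be worth checking how many rows are lost before applying it there.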

Observations: We have removed all the missing data from our datasets. Next, we check whether every feature contains the same set of values in both the test and train data.
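
A sketch of how that consistency check could look for categorical features. The column list, the frames, and the name `mismatches` are assumptions for illustration, not the notebook's actual code.

```python
import pandas as pd

# Toy frames standing in for the cleaned train and test data; the values are made up.
train = pd.DataFrame({"ExterQual": ["TA", "Gd", "Ex"], "Street": ["Pave", "Pave", "Grvl"]})
test = pd.DataFrame({"ExterQual": ["TA", "Gd", "Fa"], "Street": ["Pave", "Grvl", "Pave"]})

# Hypothetical list of categorical columns to compare.
categorical_cols = ["ExterQual", "Street"]

mismatches = {}
for col in categorical_cols:
    train_values = set(train[col].unique())
    test_values = set(test[col].unique())
    # Record the column only if one side has a value the other lacks.
    if train_values != test_values:
        mismatches[col] = {
            "train_only": train_values - test_values,
            "test_only": test_values - train_values,
        }

print(mismatches)
```

Values that appear on only one side matter later, because one-hot encoding would otherwise produce different columns for train and test.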

DATA DESCRIPTION

Observations:

DATA VISUALIZATION

Observations:

Observations:

Observations:

Bivariate graphs

Observations:

Observation:

Observation:

Observations:

Data preparation for modeling
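
Feature names such as ExterQual_TA in the final observations suggest that categorical columns were one-hot encoded. A minimal sketch of that step with `pd.get_dummies`, assuming made-up data and hypothetical frame names:

```python
import pandas as pd

# Toy frames standing in for the cleaned train and test features; the values are made up.
X_train_raw = pd.DataFrame({"GrLivArea": [1710, 1262, 1786], "ExterQual": ["Gd", "TA", "TA"]})
X_test_raw = pd.DataFrame({"GrLivArea": [896, 1329], "ExterQual": ["TA", "TA"]})

# One-hot encode the categorical column; each level becomes its own 0/1 column.
X_train = pd.get_dummies(X_train_raw, columns=["ExterQual"], dtype=int)
X_test = pd.get_dummies(X_test_raw, columns=["ExterQual"], dtype=int)

# Give the test frame exactly the train columns, filling absent levels with 0.
X_test = X_test.reindex(columns=X_train.columns, fill_value=0)
print(list(X_train.columns))
```

Reindexing against the train columns keeps both frames aligned even when a category level is missing from one of them.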

CREATING TRAINING AND VALIDATION DATA TO TEST THE MODEL BEFORE EVALUATING ON THE TEST DATASET
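
A sketch of such a split using scikit-learn's `train_test_split`. The 80/20 ratio, the `random_state`, and the binary target `y` are assumptions; the notebook does not state which values it used.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy features and a made-up binary target standing in for the prepared train data.
X = pd.DataFrame({"GrLivArea": range(10), "OverallQual": range(10, 20)})
y = pd.Series([0, 1] * 5)

# Hold out 20% of the labelled rows for validation; stratify keeps class balance.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(len(X_train), len(X_val))
```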

Building the models

MODEL 1 DECISION TREE

Calculating precision of the model
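
A minimal sketch of fitting a decision tree and scoring it on the validation split. It assumes a classification target, since the notebook reports accuracy, and uses synthetic data from `make_classification` rather than the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for the prepared features.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

# Fit an untuned decision tree on the training part only.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)

# Score on rows the model has not seen.
y_pred = tree.predict(X_val)
accuracy = accuracy_score(y_val, y_pred)
precision = precision_score(y_val, y_pred)
print(f"accuracy={accuracy:.2f} precision={precision:.2f}")
```

Accuracy is the share of correct predictions overall, while precision is the share of predicted positives that are truly positive, so the two can differ noticeably on imbalanced data.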

Observations:

Hyperparameter tuning
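
One common way to tune a decision tree is a cross-validated grid search. The sketch below assumes `GridSearchCV` and a small, hypothetical parameter grid; the notebook's actual grid is not shown, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for the training split.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Hypothetical grid: every combination is evaluated with 5-fold cross-validation.
param_grid = {"max_depth": [2, 4, 6], "min_samples_leaf": [1, 5, 10]}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0), param_grid, cv=5, scoring="accuracy"
)
search.fit(X, y)

# The best estimator is refit on all of X, y with the winning parameters.
best_tree = search.best_estimator_
print(search.best_params_, round(search.best_score_, 3))
```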

MODEL 1 TUNED

Observations:

MODEL 2 RANDOM FOREST

Observations:

Hyperparameter tuning

RANDOM FOREST HYPERPARAMETER TUNING
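
For a random forest the search space grows quickly, so a randomized search over a few sampled combinations is one reasonable option. This is a sketch under assumptions: the parameter ranges, `n_iter`, and the synthetic data are illustrative and not taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic binary classification data standing in for the training split.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Hypothetical search space; only n_iter combinations are sampled from it.
param_distributions = {
    "n_estimators": [20, 50],
    "max_depth": [3, 5, None],
    "min_samples_split": [2, 5],
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions,
    n_iter=4,
    cv=3,
    scoring="accuracy",
    random_state=0,
)
search.fit(X, y)

tuned_forest = search.best_estimator_
print(search.best_params_, round(search.best_score_, 3))
```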

Observations:

IMPORTANT FEATURES THAT AFFECT THE RESULTS
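
A sketch of how feature importances can be read from a fitted random forest. The column names are borrowed from the final observations, but the data is synthetic, so the ranking printed here says nothing about the real dataset.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Column names reused for illustration; the values themselves are synthetic.
feature_names = ["GrLivArea", "OverallQual", "GarageCars", "YearBuilt", "LotArea", "FullBath"]
X_arr, y = make_classification(n_samples=200, n_features=6, n_informative=3, random_state=0)
X = pd.DataFrame(X_arr, columns=feature_names)

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Impurity-based importances, one per feature, normalised to sum to 1.
importances = pd.Series(forest.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
top_features = importances.head(3)
print(top_features)
```

Impurity-based importances tend to favour features with many distinct values, so they are best read as a rough ranking rather than an exact measure of influence.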

CREATING DATAFRAME WITH THE ID AND PREDICTION RESULTS
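
A minimal sketch of assembling the result frame. The column name `Prediction`, the Id values, and the predictions are all made up for illustration; in the notebook the predictions would come from the tuned model applied to the test features.

```python
import pandas as pd

# Made-up Ids and predictions standing in for the test Ids and model output.
test_ids = pd.Series([1461, 1462, 1463], name="Id")
predictions = [0, 1, 1]

# One row per test record: its Id next to the predicted label.
results = pd.DataFrame({"Id": test_ids, "Prediction": predictions})

# With no path given, to_csv returns the CSV text instead of writing a file.
csv_text = results.to_csv(index=False)
print(csv_text)
```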

OBSERVATIONS: The final model is the one with the best performance, which in our case is the random forest with tuned parameters, at 86% accuracy. Unlike in the other models, the most influential features in this model are: GrLivArea, OverallQual, GarageCars, YearBuilt, LotArea, FullBath, ExterQual_TA, YearRemodAdd, Fireplaces, OpenPorchSF.